Seeking Stable Clusters in the Blogosphere
Abstract
The popularity of blogs has increased dramatically over the last couple of years. As topics evolve in the blogosphere, keywords align together and form the heart of various stories. Intuitively, we expect that in certain contexts, when there is a lot of discussion on a specific topic or event, a set of keywords will be correlated: the keywords in the set will frequently appear together (pair-wise or in conjunction), forming a cluster. Note that such keyword clusters are temporal (associated with specific time periods) and transient. As topics recede, the associated keyword clusters dissolve because their keywords no longer appear together frequently. In this paper, we formalize this intuition and present efficient algorithms to identify keyword clusters in large collections of blog posts for specific temporal intervals. We then formalize problems related to the temporal properties of such clusters. In particular, we present efficient algorithms to identify clusters that persist over time. Given the vast amounts of data involved, we present algorithms that are fast (they can efficiently process millions of blogs with multiple millions of posts) and take special care to make them efficiently realizable in secondary storage. Although we instantiate our techniques in the context of blogs, our methodology is generic enough to apply equally well to any temporally ordered text source. We present the results of an experimental study using both real and synthetic data sets, demonstrating the efficiency of our algorithms, both in terms of performance and in terms of the quality of the keyword clusters and associated temporal properties we identify.
Similar resources
Overview of the TREC 2009 Blog Track
The Blog track explores the information seeking behaviour in the blogosphere. Thus far, since its inception in 2006 [9], the Blog track addressed two main search tasks based on the analysis of a commercial blog search engine: the opinion-finding task (i.e. “What do people think about X?”) and the blog distillation task (i.e. “Find me a blog with a principal, recurring interest in X.”). In TREC ...
Identifying and Ranking Topic Clusters in the Blogosphere
The blogosphere is a huge collaboratively constructed resource containing diverse and rich information. This diversity and richness presents a significant research challenge to the Information Retrieval community. This paper addresses this challenge by proposing a method for identification of “topic clusters” within the blogosphere where topic clusters represent the concept of grouping together...
The Blogosphere at a Glance—Content-Based Structures Made Simple
A network representation based on a basic word-overlap similarity measure between blogs is introduced. The simplicity of the representation renders it computationally tractable, transparent and insensitive to representation-dependent artifacts. Using Swedish blog data, we demonstrate that the representation, in spite of its simplicity, manages to capture important structural properties of the co...
Spontaneous Adsorption, and Selective Sensing of CO, and CO2 Greenhouse Gaseous Species by the more Stable Forms of N4B4 Clusters
Carbon oxide gaseous species are considered potential pollutants of the Earth's atmosphere; in particular, carbon monoxide and carbon dioxide, the best-known carbon oxides, play a significant role in greenhouse gas emissions. Moreover, these species could initiate or handle some chain reactions in the troposphere that lead to the emergence of some secondary air pollutants which may ...
Searching for “Familiar Strangers” on Blogosphere: Problems and Challenges
In this work, we examine familiar strangers on Blogosphere and issues of finding them. In our daily life, familiar strangers, as coined by Stanley Milgram, do not know each other, but frequently exhibit some common patterns. Blogosphere is a part of the Web where bloggers post in individual or community blog sites. The nature of the Web is a scale-free network, which determines that a power law...
Publication date: 2007